BMC Medical Genomics — Latest Matching Preprints

1

SCIA: A fast and widely applicable pipeline for measuring expanded repeat instability

Smith, C.; Peter Durairaj, R. R.; Randall, E. L.; Aston, A. N.; Heraty, L.; Elsayed, W.; Murillo, A.; Dion, V.

2026-03-15 neuroscience 10.64898/2026.03.12.707943 medRxiv

Top 0.1%

14.9%

The expansion of short tandem repeats is a feature of over 60 different human diseases. Ongoing somatic instability throughout a patient's lifetime can influence disease progression and has emerged as a therapeutic target. Understanding its mechanism is essential for the identification of both drug targets and therapeutic interventions. A major obstacle towards this translational goal has been to measure changes in repeat size distribution in a timely manner. To address this, here we present the Single Clone-based Instability Assay (SCIA), a streamlined experimental design that saves weeks in assessing the effect of a gene knockout on repeat instability. The approach avoids bulk cultures and does not require a reporter cell line. It uses targeted long-read sequencing as a readout for repeat instability. We have validated the approach using FAN1, PMS1, and MLH1 knockouts in HEK293-derived cells. We provide visualization software that generates delta plots and extracts the instability frequency, the bias towards expansion or contraction, and the average size of the changes. Using SCIA, we find that although FAN1 knockout clones showed an increased frequency of expansions, the expansions were smaller in size. This highlights the wealth of information that can be extracted and the potential for novel insights into the mechanism of repeat instability.
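
The three summary metrics named in this abstract (instability frequency, expansion/contraction bias, and average change size) can be illustrated with a minimal sketch. The definitions below are assumptions chosen for illustration, not the SCIA software's actual formulas: the function name `instability_summary`, the use of a single reference repeat size, and the bias formula are all hypothetical.

```python
from statistics import mean


def instability_summary(repeat_sizes, reference_size):
    """Summarise repeat-size changes relative to a reference size.

    Assumed (hypothetical) definitions:
      - instability frequency: fraction of reads whose size differs from the reference
      - expansion bias: (expansions - contractions) / changed reads, in [-1, 1]
      - mean change sizes: average magnitude of expansions and of contractions
    """
    if not repeat_sizes:
        raise ValueError("repeat_sizes must not be empty")

    deltas = [size - reference_size for size in repeat_sizes]
    expansions = [d for d in deltas if d > 0]
    contractions = [-d for d in deltas if d < 0]  # stored as positive magnitudes
    n_changed = len(expansions) + len(contractions)

    return {
        "instability_frequency": n_changed / len(deltas),
        "expansion_bias": (len(expansions) - len(contractions)) / n_changed
        if n_changed
        else 0.0,
        "mean_expansion_size": mean(expansions) if expansions else 0.0,
        "mean_contraction_size": mean(contractions) if contractions else 0.0,
    }


# Toy input: six reads from one clone, reference size of 100 repeats.
summary = instability_summary([100, 100, 102, 104, 98, 100], reference_size=100)
```

Comparing such summaries between a knockout clone and a control clone is one way the reported pattern (more frequent but smaller expansions) could show up: a higher `expansion_bias` together with a lower `mean_expansion_size`.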

2

Translational bioinformatics and machine learning framework for biomarker discovery, disease prediction, and patient profiling for precision medicine

Ahmed, Z.; Govindareddy, P.; DeGroat, W.; Narayanan, R.; Peker, E.; Zeeshan, S.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353961 medRxiv

Top 0.1%

12.3%

Precision medicine aims to move healthcare from a "one-size-fits-all" approach to personalized and predictive care across diverse populations. It promotes integration of multi-omics and phenotypic data to understand disease mechanisms and discover novel biomarkers and risk factors, which could be used to predict and prevent critical diseases in individual patients. A precision medicine approach can accelerate our ability to classify patients at higher risk of developing critical diseases, improve diagnostic capabilities, develop a deeper understanding of individual risk, investigate racial differences and demographic characteristics, and find relationships between genetic variants, expression, and diseases. This study focuses on implementing an innovative, data-driven framework of translational bioinformatics and Machine Learning (ML) techniques to analyze multi-omics data, including RNA-seq and Whole-Genome Sequencing (WGS) data, generated from blood samples of randomly consented patients. First, we utilized bioinformatics pipelines to identify differentially expressed genes and their pathogenic and likely pathogenic variants for downstream data analysis, annotation, and visualization. Then, we applied a nexus of ML models for multi-omics biomarker discovery, disease prediction, density-based clustering, single-patient profiling, and pathogenicity classification. WGS data analysis supported the exploration of genetic variation and diversity among patients to identify known and novel biomarkers, whereas RNA-seq data analysis improved our understanding of the functional and biological pathways underlying disease states. We classified and clustered pathogenic variants and expression across various genes and discovered numerous leading disease risk factors.
Our results include gene-disease associations and capture common pathways across the broader population, demonstrating a level of sensitivity and accuracy that has broad clinical implications. We validated our results against clinical records and state-of-the-science literature. This study examines the strengths of multi-omics data integration and the capabilities of ML applications in genetically diverse and complex patient cohorts. Our approach has the potential to elucidate complex gene-disease interactions for genetically diverse populations, which can support earlier diagnoses for patients in many disease realms.

3

Integration of single-cell and bulk RNA sequencing reveals TREM1 as a promising biomarker and therapeutic target for gouty arthritis

Jinfeng, W.; Jiarui, Z.; Hongbin, Q.

2026-05-20 public and global health 10.64898/2026.05.15.26353351 medRxiv

Top 0.1%

7.0%

Objective: This study aimed to systematically screen for potential candidate biomarkers and identify therapeutic targets associated with gouty arthritis (GA) through integrated analyses of single-cell and bulk RNA sequencing (RNA-seq) data. Methods: The single-cell dataset GSE211783 and the bulk RNA-seq dataset GSE160170 were analyzed using a series of bioinformatic approaches, including cell clustering, differential expression analysis, immune cell infiltration assessment, protein-protein interaction (PPI) network construction, gene set enrichment analysis, and drug sensitivity evaluation. To establish an animal model of GA, monosodium urate crystals were injected intra-articularly into experimental mice. Joint swelling was evaluated, and morphological changes in joint tissues were analyzed through hematoxylin-eosin staining. The presence of TREM1-positive cells was detected by immunohistochemistry, and the level of TREM1 protein expression in joint tissues was assessed by Western blotting. Results: We identified 102 differentially expressed genes (DEGs) and 14 signaling pathways associated with GA. The PPI network revealed 25 hub genes, of which 17 (including TREM1, TNF, PTGS2, and NLRP3) were highly expressed and 8 (including FCGR3B and CXCR6) showed low expression in the GA samples. These genes correlated significantly with the infiltration levels of macrophages. Among the hub genes, TREM1 was selected for further validation because it correlated significantly with all 14 differential pathways. In animal experiments, GA mice developed marked joint swelling and inflammatory tissue injury, along with a significant increase in TREM1-positive cells and TREM1 protein expression. Conclusion: Integrative analysis of single-cell and bulk RNA-seq data identified 102 GA-related DEGs and 14 key pathways, from which 25 hub genes were screened.
TREM1 is significantly upregulated in GA and may be linked to macrophage function, providing new insights into biomarker and therapeutic target discovery for GA.

4

Pulmonary Hypertension Engine for Linked Experiments (PHELEX): a platform for the re-analysis of public transcriptomic data related to pulmonary hypertension in both animal models, and humans.

Nandani, T.; Ott, B. P.; Balaratnam, P.; Archer, S. L.; Durbin, J.; Hindmarch, C. C. T.

2026-05-01 genomics 10.64898/2026.04.28.721394 medRxiv

Top 0.1%

6.9%

Pulmonary hypertension (PH) is a vasculopathy that results in elevated mean pulmonary arterial pressures over 20 mmHg. Despite significant advances in research, PH still has a high mortality rate, and there is currently no cure for the disease. As with all biomedical fields, PH researchers have embraced the power of next generation technologies such as microarrays and RNA sequencing. Most of these data can be found in public repositories, since deposition is usually a requirement for publication. While these repositories are rich sources of data, they require intermediate to advanced bioinformatics skills to access, download, and make these data useful. Here we present Pulmonary Hypertension Engine for Linked Experiments (PHELEX), which represents a comprehensive catalogue of all RNA sequencing data related to PH that is currently available on the Gene Expression Omnibus (GEO), hosted by the US National Center for Biotechnology Information (NCBI). We identified 2,278 bulk RNA sequencing samples from human, mouse and rat, and built a searchable tool based on the metadata that is associated with each sample. PHELEX is a functional tool that allows selected studies to be highlighted and parsed through Confidence, an analysis tool we have created, which will model the data based on user-defined classifiers, perform differential gene expression and pathway analysis, and present these data using standard graphics and text-file results. PHELEX also allows PH researchers to cross-cut between discrete studies, facilitating de novo understanding of these data. As a robust searchable repository of genomic data, we hope that PHELEX will accelerate PH innovation and discovery by allowing researchers to mine existing genomic data and thus better understand the molecular signatures that underpin PH.

5

Identification of drug candidates for rescue of SOX17 gene targets in pulmonary arterial hypertension

Vasilaki, E.; Akosman, B.; Song, S.; Walters, R.; Sharma, Y.; Pereira, M.; Keles, M.; Mykytyuk, N.; Maude, H.; Singh, N.; Field, G.; Ventetuolo, C. E.; Howard, L.; Aman, J.; Wilkins, M. R.; Klinger, J. R.; Zhao, L.; Cebola, I.; Liang, O.; Rhodes, C. J.

2026-05-21 pharmacology and toxicology 10.64898/2026.05.14.725284 medRxiv

Top 0.1%

6.2%

Background: Both rare and common variants in the SRY-Box Transcription Factor 17 (SOX17) locus are associated with pulmonary arterial hypertension (PAH). SOX17 dysregulation leads to pulmonary artery endothelial cell (PAEC) dysfunction and the obstructive remodelling that characterises PAH. Hypothesis: Impaired SOX17 expression contributes to the pathogenesis of PAH. Restoring the function of SOX17 or its downstream targets using compounds that mimic its transcriptomic signature will rescue PAEC dysfunction and prevent PAH development. Methods and Results: We defined thousands of genes with direct SOX17 genomic binding sites and identified important potential binding partners, including ETS-transcription factors such as ERG by ChIP-seq in PAECs. Through the integration of three PAEC RNA-seq datasets involving overexpression and silencing of SOX17, we defined a robust SOX17 transcriptomic signature. In PAH patients, circulating plasma protein levels of 10 SOX17 signature genes were associated with the SOX17 common risk variants. This included EFNB2 and UNC5B; knockdown of these genes altered the viability and apoptosis of PAECs in response to TNF treatment. The drug-transcriptome database Connectivity Map (CMap) was used to predict novel potential therapeutic compounds to correct the SOX17 transcriptomic signature. Five compounds were selected for in vitro testing and were able to partially reinstate SOX17 target gene expression in PAECs. One compound, BX-912, was selected for in vivo testing as it corrected the levels of multiple target genes, including suppressing Runt-related transcription factor-1 (RUNX1). BX-912 blocked the development of pulmonary hypertension in mice lacking the SOX17 enhancer associated with human disease. Conclusion: We have demonstrated the therapeutic potential of targeting SOX17 in PAH through correction of its gene targets, identifying BX-912 as a lead compound with in vivo efficacy.

6

Isoquercetin treatment of mouse sickled red blood cells shows a discernible deformability and sickling phenotype

Owegie, O. C.; Hancco Zirena, I.; Penubothu, T.; Ghiran, I. C.; Yang, M.

2026-04-28 pharmacology and toxicology 10.64898/2026.04.24.720679 medRxiv

Top 0.1%

4.1%

Introduction: Sickle cell disease is an inherited hemoglobinopathy with defective red cell deformability. The defective deformability promotes microvascular occlusion and subsequent vaso-occlusion in sickle cell disease patients. Previous studies have demonstrated that thiol isomerases, endoplasmic reticulum-resident oxidoreductases that are released from vascular cells into the bloodstream, are present on the red cell membrane and contribute to cellular dehydration and sickling. However, the role of membrane-bound thiol isomerases on sickled red blood cells is unclear. Methods: Using red blood cells from Townes humanized sickle cell or non-sickled mice, we performed ektacytometry under shear using a laser-assisted optical rotational cell analyzer (LORRCA) to assess the effects of antagonizing thiol isomerases with isoquercetin and a functional blocking monoclonal antibody. The densitometric properties of sickled red blood cells in the presence of isoquercetin were also tested using magnetic levitation. Results: Thiol isomerase antagonism increased sickled red cell elongation, cellular dehydration and the diamagnetic signature compared to control treatment. Conclusion: Thiol isomerases may be involved in regulating the mechanical properties of sickled red blood cells through mechanisms that require further investigation.

7

Carbohydrate Metabolism Differs in Infants by Asthma-risk Status and is Associated with the Functional Potential of Bacteroides cellulosilyticus

Steininger, H. M.; Iglesias-Aguirre, C. E.; Panzer, A. R.; Durack, J.; McKean, M.; Cabana, M. D.; Diamond, S.; Lynch, S. V.

2026-05-04 microbiology 10.64898/2026.04.28.721144 medRxiv

Top 0.1%

4.0%

Childhood atopic disease is linked to delayed gut microbiome development and metabolic dysfunction; however, microbial drivers remain unclear. To explore microbial correlates of asthma risk during a time of active gut microbiome development, we analyzed stool from 6-month-old infants at high asthma risk (HR) or healthy controls (HC), using genome-resolved metagenomics (HR=7; HC=12) and untargeted metabolomics (HR=11; HC=15). We recovered 82 bacterial species-level metagenome-assembled genomes (MAGs). Global taxonomic composition did not differ by asthma risk. Anticipating that key differences might associate with specific genomes, we used a machine-learning approach, which pinpointed Bacteroides cellulosilyticus, Hungatella effluvii, and Enterocloster aldenensis as linked with asthma risk status. All three species were more abundant in HC infants, and the B. cellulosilyticus genome was enriched for carbohydrate metabolism genes relative to other MAGs. Metabolomic profiling revealed variance associated with asthma risk (PERMANOVA, R² = 0.069, p = 0.016). HR fecal metabolomes were enriched in simple sugars, whereas HC contained more nitrogenous compounds. Integrative genome-metabolic modeling of compounds that significantly differentiate asthma-risk groups revealed risk-dependent interactions with community-encoded metabolic potential (CEP) for arabinose and agmatine, whose fecal concentrations are linked with B. cellulosilyticus and H. effluvii functional traits, respectively. These findings suggest that microbial-influenced metabolic differences associate with asthma risk at 6 months, with B. cellulosilyticus and H. effluvii emerging as candidate bacteria influencing this observed metabolic remodeling.
Impact statement: Leveraging a random forest classifier, we identified three bacterial species (Bacteroides cellulosilyticus, Hungatella effluvii, and Enterocloster aldenensis) as distinguishing features enriched in healthy 6-month-old infant microbiomes compared to those at high risk of asthma development (HR). We developed an approach to integrate metabolomics and metagenomic-derived microbiome community encoded potential (CEP) with clinical outcomes to identify fecal metabolites whose concentrations are likely to be influenced by the microbiome. Fecal arabinose concentrations were positively associated with CEP in healthy infants, but not in HR subjects, who exhibited elevated concentrations irrespective of CEP. These data implicate microbial activity as a contributor to the concentration of this metabolite in healthy but not HR infants. With leave-one-out cross-validation, we identified B. cellulosilyticus as a contributor to fecal arabinose concentrations. Our data indicate that microbial functional deficits in HR infants are associated with gut metabolic dysfunction during microbiome maturation. Data summary: Durack et al. [1] is the source of the metabolomics data utilized in this study. The authors confirm that all other supporting data, code and protocols have been provided within the article or through supplementary data files.

8

Evo 2 Predicts Cardiomyopathy-Associated Variants and Elucidates Their Underlying Mechanisms

Kurozumi, A.; Otsuka, N.; Masamichi, I.; Kawakami, T.; Isagawa, T.; Kodera, S.; Takeda, N.

2026-05-17 genomics 10.64898/2026.05.15.725304 medRxiv

Top 0.1%

3.7%

Background: Although advances in next-generation sequencing have accelerated the identification of genetic variants in cardiomyopathy, interpreting variants of uncertain significance (VUS) remains a clinical challenge. Evo 2 is a high-resolution genomic artificial intelligence model capable of predicting pathogenicity across large sequence contexts and enabling mechanistic interpretation; however, its application in cardiovascular genetics is limited. Here, we evaluated the utility of Evo 2 for assessing the pathogenicity and underlying mechanisms of cardiomyopathy-associated variants. Methods: We used Evo 2 to predict the pathogenicity of single-nucleotide variants in cardiomyopathy-related genes listed on ClinVar. We assessed the ability of the model to identify characteristic structural features in both coding and noncoding regions using internal representations such as embeddings, and to infer the molecular mechanisms of variants within these regions. Results: Evo 2 demonstrated high predictive accuracy for pathogenicity, achieving an AUROC of 0.983 and an AUPRC of 0.915. Notably, sparse autoencoders (SAEs) from embeddings identified features corresponding to higher-order structural features, including coiled-coil and actin-binding domains characteristic of cardiomyopathy-related proteins, and accurately detected mutations known to disrupt these domains. The model recognized the binding motif of the cardiac-enriched transcription factor TBX5 with SAEs and accurately predicted a single-nucleotide polymorphism affecting TBX5 binding affinity after supervised fine-tuning. Conclusions: Evo 2 demonstrated strong performance for both predicting pathogenicity and extracting biological features of cardiomyopathy-associated variants. It may represent a powerful emerging tool for evaluating VUS in cardiovascular medicine.

9

Genomic network analysis links uveitis with systemic inflammatory diseases

Chau, K.; Allison, K.; Braithwaite, T.; Harley, I.; Hassman, L. M.

2026-03-26 ophthalmology 10.64898/2026.03.24.26349228 medRxiv

Top 0.1%

3.7%

Objective: To determine whether uveitis shares genetic similarity with extraocular immune-mediated inflammatory diseases (IMIDs), we performed network analysis of putative causal genes associated with ocular inflammatory disease, IMIDs and eye-specific diseases, including age-related macular degeneration and monogenic disorders. Methods: We identified putative causal genes for genome-wide significant variants from uveitis, IMIDs and ocular diseases using OpenTargets and published studies. To assess the gene-level pleiotropy between disease groups, we quantified the causal gene overlap between groups, and the Jaccard Similarity Indices for individual disease pairs. We then used a network approach to assess the molecular genetic similarity between diseases at a biological pathway level and comparative statistics to identify diseases with greater network similarity to uveitis. Results: Seventy-five percent of the putative causal genes for uveitis are also causal for IMIDs, while no uveitis genes are shared with primary ocular disorders. Network analysis revealed that 1) uveitis genes are more closely networked with systemic IMID genes than with ocular-specific disease genes; and 2) significant network similarity links uveitis and specific IMIDs, such as ankylosing spondylitis and sarcoidosis. Conclusions: Overlapping causal genes and network similarity indicate that uveitis is predominantly an inflammatory disease, sharing genetic architecture with other IMIDs. Future studies aimed at dissecting genetic heterogeneity within uveitis may determine whether subgroups share common immune pathways that could nominate endotype-specific therapeutic approaches.
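
The two gene-level overlap measures mentioned in this abstract (the share of one disease's genes found in another group, and the Jaccard similarity index for a disease pair) are simple set calculations. The sketch below is illustrative only: the function names and the placeholder gene labels are hypothetical and are not taken from the study's data.

```python
def jaccard_index(genes_a, genes_b):
    """Jaccard similarity: |A intersect B| / |A union B| (0.0 if both sets are empty)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_fraction(genes_a, genes_b):
    """Fraction of genes in A that also appear in B (0.0 if A is empty)."""
    a, b = set(genes_a), set(genes_b)
    return len(a & b) / len(a) if a else 0.0


# Placeholder gene labels, purely for illustration.
disease_x = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
disease_y = {"GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"}

similarity = jaccard_index(disease_x, disease_y)   # 3 shared / 6 total = 0.5
overlap = shared_fraction(disease_x, disease_y)    # 3 of 4 genes in X = 0.75
```

The two measures answer different questions: `shared_fraction` is asymmetric and matches a statement of the form "75% of the genes for X are also causal for Y", whereas the Jaccard index is symmetric and penalises a large partner set.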

10

A proteomic polygenic score to identify IL-18 driven inflammatory bowel disease

Turchin, M. C.; Raghupathy, N.; Carty, C. L.; Morris, M.; Maranville, J. C.; Holzinger, E. R.

2026-05-21 genetic and genomic medicine 10.64898/2026.05.18.26353508 medRxiv

Top 0.1%

3.7%

High levels of IL-18 have been causally implicated in IBD risk and may represent a unique mechanism driving IBD yet to be therapeutically targeted. To identify individuals predisposed to increased levels of IL-18, we implemented a polygenic approach to predict IL-18 plasma protein levels. Using a dataset of over 50,000 individuals with both genetic data and plasma protein levels from Olink, we developed a 27-SNP polygenic score (PGS) that predicts IL-18 levels and IBD risk. Further, we identified a threshold to classify patients as 'IL-18 High' using a data-driven approach that optimized prediction of both IL-18 and IBD risk. We show that ~30% of the overall IBD patient population is 'IL-18 High', meaning they have a genetic predisposition towards higher protein levels. The IL-18 PGS and corresponding threshold have the potential to identify IBD patients with IL-18-driven IBD who may respond more effectively to a therapy targeting this mechanism.
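
A polygenic score of the kind described here is commonly computed as a weighted sum of effect-allele dosages, followed by a threshold for classification. The sketch below assumes that standard additive form; the SNP identifiers, weights, and threshold are hypothetical placeholders and are not values from the study.

```python
def polygenic_score(dosages, weights):
    """Additive score: sum over SNPs of weight * effect-allele dosage (0, 1 or 2).

    SNPs missing from `dosages` contribute 0 (an assumption made for simplicity;
    real pipelines typically impute or flag missing genotypes).
    """
    return sum(weight * dosages.get(snp, 0) for snp, weight in weights.items())


def classify(score, threshold):
    """Label an individual relative to a chosen score threshold."""
    return "IL-18 High" if score >= threshold else "not IL-18 High"


# Hypothetical two-SNP example (the study's score uses 27 SNPs).
weights = {"snp_1": 0.5, "snp_2": -0.25}
person = {"snp_1": 2, "snp_2": 1}

score = polygenic_score(person, weights)   # 0.5*2 + (-0.25)*1 = 0.75
label = classify(score, threshold=0.5)     # "IL-18 High"
```

How the threshold is chosen is the substantive step; the abstract describes a data-driven choice optimised for both IL-18 level and IBD risk prediction, which this sketch does not attempt to reproduce.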

11

Single-cell Landscape of T Cell Heterogeneity in Kawasaki Disease: STAT3/JAK Axis Regulates the Lineage Differentiation Bias of Th17 Cells

Song, S.; Zong, Y.; Xu, Y.; Chen, L.; Zhou, Y.; Chen, L.; Li, G.; Xiao, T.; Huang, M.

2026-03-23 bioinformatics 10.64898/2026.03.18.712795 medRxiv

Top 0.1%

3.7%

Background: Kawasaki disease (KD) is a pediatric systemic vasculitis in which T-cell-mediated immune responses play a pivotal role. However, the precise dynamic evolution of T-cell subsets during disease progression remains poorly understood. Methods: Single-cell RNA sequencing (scRNA-seq) was employed to perform high-resolution annotation of peripheral blood mononuclear cells (PBMCs) from healthy controls and KD patients, both pre- and post-IVIG treatment. T-cell developmental trajectories were reconstructed via Monocle3-based pseudotime analysis. Furthermore, the functional significance of the identified pathway was validated in a CAWS-induced KD murine model. Results: A high-resolution single-cell landscape identified 13 distinct T-cell subtypes. Pseudotime analysis revealed a significant lineage commitment of CD4+ T cells toward a Th17 phenotype during the acute phase of KD, synchronized with the transcriptional upregulation of the STAT3/JAK signaling axis. Animal experiments further demonstrated that pharmacological inhibition of this pathway substantially attenuated inflammatory infiltration in the cardiac vasculature of KD mice. Conclusion: This study identifies the STAT3/JAK-mediated Th17 differentiation bias as a potential regulatory program associated with acute inflammation in Kawasaki disease, thereby highlighting the STAT3/JAK axis as a potential therapeutic target.

12

Comprehensive Profiling of Age- and Immune Cell- Specific Signaling Activation Using Multiplex Phosphoflow

Hadlova, P.; Svaton, M.; Kochmannova, K.; Korzhenevich, J.; Schmidt, F.; Neys, S. F. H.; Bott, M.-T.; Vrabcova, P.; Staniek, J.; Bloomfield, M.; Kalina, T.; Rizzi, M.

2026-05-27 immunology 10.64898/2026.05.24.727113 medRxiv

Top 0.1%

3.7%

Immune phenotyping represents a pillar in diagnostics, characterization of new genetic defects, and understanding mechanisms of diseases. Cell population distribution often does not capture the intrinsic functional changes that may contribute to disease. The outcome of signaling activation can be used as a proxy for cell function. To overcome the limitation of sample availability and standardization of signaling assays, we developed a multiplex full spectrum cytometry phosphoflow assay allowing the study of 6 phospho-proteins representing BCR/TCR, MAPK, PI3K/Akt/mTOR and canonical NF-κB signaling pathways in 18 immune cell subpopulations. Maximal stimulation and temporal dynamics were studied in response to pan-stimuli, activating cells regardless of receptor, and targeted stimuli for T, B, and innate immune cells. We studied healthy individuals aged 1-69 years and discovered subpopulation-specific responses. Furthermore, pediatric donors showed broad differences in B cell and T cell function compared to adults. Hence, we established a tool to assess multiple signaling pathways at once and provide age- and subpopulation-specific references for signaling outcome. Summary: Multiplex full spectrum flow cytometry-based phosphoflow assay across 18 immune cell subpopulations, 6 phospho-proteins in response to 6 stimuli at 4 time points in individuals aged 1-69 years, reveals distinct age- and subpopulation-associated signaling patterns in magnitude and dynamics of pathway activation.

13

Integrative multi-cohort analysis reveals consistent sex differences in gut microbiota of multiple sclerosis patients

Soler-Saez, I.; Galiana-Rosello, C.; Grillo-Risco, R.; Falony, G.; Tepavčević, V.; Vieira Silva, S.; Garcia-Garcia, F.

2026-04-22 neuroscience 10.64898/2026.04.17.719247 medRxiv

Top 0.1%

3.6%

Biological sex is a key determinant in the onset and progression of multiple diseases. In multiple sclerosis (MS), females exhibit higher disease prevalence, earlier onset, and more pronounced inflammatory activity, whereas males tend to experience a more severe neurodegenerative course, characterized by accelerated central nervous system damage and increased brain atrophy. The gut microbiome has emerged as a critical factor in MS, as its composition can either ameliorate or exacerbate disease progression. In this study, we aimed to identify reproducible sex-associated differences in gut microbial composition across independent cohorts of MS patients. Through a systematic search we identified six independent studies based on 16S rRNA gene sequencing, comprising a total of 337 samples. Despite substantial inter-study variability, sex-associated differences were more pronounced in MS patients than in healthy controls. We identified 11 microbial taxa showing significant sex-associated differences in MS, nine enriched in females and two in males. Notably, the female-enriched taxa Eggerthella and Eisenbergiella were associated with specific MS subtypes and higher disability. To facilitate the use of our findings by the scientific community, we developed a freely accessible web-based tool that provides full access to our results. Thus, in this work we identified consistent and reproducible sex differences in the gut microbiota of MS patients, highlighting the importance of incorporating sex as a critical variable in microbiome research, with potential implications for understanding disease heterogeneity in MS. Importance: Multiple sclerosis (MS) affects females and males differently, but the biological reasons behind these differences are not fully understood. One potential factor is the gut microbiome (i.e., the community of microorganisms living in our intestines), which can influence immune function and disease progression.
In this study, we analyzed data from multiple independent cohorts and found consistent differences in gut microbial composition between female and male MS patients. Notably, certain bacteria were more abundant in females and were linked to more severe disease features. We also developed a freely accessible web tool where researchers can explore the complete findings in detail. Our results highlight the importance of considering sex as a key factor in microbiome research and may help guide more personalized approaches to understanding and treating MS.

14

Integrated luminescence and phenotypic profiling for drug discovery in a zebrafish model of Marfan syndrome

Horvat, M.; Caboor, L.; De Rycke, K.; Mennens, L.; Daniels, E.; Wyseur, J.; Verhelst, E.; Roos, I.; Rodriguez-Rovira, I.; Egea, G.; De Backer, J.; Sips, P.

2026-05-13 pharmacology and toxicology 10.64898/2026.05.12.722859 medRxiv

Top 0.1%

3.6%

Background: Marfan syndrome (MFS) is a life-threatening heritable connective tissue disorder caused by pathogenic variants in fibrillin-1, characterized by progressive cardiovascular disease. Current medical therapies slow disease progression but do not prevent major complications, underscoring the need for new treatment strategies and unbiased discovery approaches. Methods: We used a zebrafish model of MFS lacking fibrillin-3 (fbn3-/-), which recapitulates key cardiovascular phenotypes including cardiac stress, valvular defects, arrhythmia, and aortic dilation. To enable sensitive, quantitative assessment of cardiac stress, we generated a novel transgenic zebrafish reporter expressing secreted nanoluciferase under control of the stress-responsive nppb promoter. This reporter was combined with morphological phenotyping and bulbus arteriosus (BA) imaging. We evaluated standard MFS therapies and targeted modulators of TGF-β signaling, and performed an unbiased high-throughput drug screen of over 1,500 clinically approved compounds across multiple developmental treatment windows. Results: fbn3-/- larvae exhibited markedly elevated nppb activity that correlated with phenotypic severity and peaked during stages of highest mortality. The nanoluciferase reporter provided a ~1,000-fold dynamic range, substantially outperforming Firefly luciferase-based assays. Pharmacological inhibition of TGF-β signaling produced transient or deleterious effects, while β-blockers, losartan, and allopurinol failed to consistently improve cardiac stress, pericardial edema, or BA dilation. The unbiased high-throughput drug screen identified a small number of primary and secondary hits; however, none demonstrated reproducible phenotypic rescue upon rigorous multi-dose, multi-time window validation. Conclusions: This study establishes a sensitive zebrafish-based platform for early, quantitative assessment of cardiovascular stress in MFS.
Our findings highlight the limited efficacy of current therapies, the context-dependent nature of TGF-β modulation, and the biological complexity underlying MFS pathogenesis. Although no definitive therapeutic candidates were identified, this work lays a robust foundation for expanded unbiased discovery efforts aimed at identifying disease-modifying interventions for MFS.

15

From SNPs to Pathways: A genome-wide benchmark of annotation discrepancies and their impact on protein- and pathway-level inference

Queme, B.; Muruganujan, A.; Ebert, D.; Mushayahama, T.; Gauderman, W. J.; Mi, H.

2026-03-24 bioinformatics 10.64898/2026.03.21.713397 medRxiv

Top 0.2%

3.4%

Background: Accurate single-nucleotide polymorphism (SNP) annotation is central to genomic research, yet widely used tools and gene models often yield divergent results. Prior studies have shown such discrepancies in small datasets, but the extent of genome-wide variation and its impact on downstream pathway analysis remain unclear. Results: We conducted a comprehensive comparison of three commonly used SNP annotation tools, ANNOVAR, SnpEff, and VEP, using both Ensembl and RefSeq gene models to evaluate more than 40 million SNPs from the Haplotype Reference Consortium. At the protein level, annotation output differed significantly across tools and gene models (p-adj < 0.001), with discrepancies present in both genic and intergenic regions. RefSeq produced broader annotation coverage, particularly for intergenic SNPs, while Ensembl showed greater internal consistency. SnpEff provided the most complete coverage overall, whereas no single tool or model configuration achieved full annotation recovery of the union reference. Integration across tools and models maximized coverage and reduced annotation loss. In a case study of 204 colorectal cancer-associated SNPs from the FIGI GWAS, pathway enrichment results varied depending on annotation strategy. The fully integrated approach identified all four significant pathways, whereas several single-tool or single-model strategies missed one or more. Conclusion: SNP annotation outcomes are influenced by both the tool and gene model used, and relying on a single approach may result in incomplete coverage. A multi-tool, multi-model strategy provides the most comprehensive annotation and preserves enriched pathways, supporting more robust and reproducible genomic interpretation.
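
The idea of measuring each configuration against a "union reference" can be sketched with plain sets. This is an assumed, simplified reading of the benchmark: the function name `coverage_vs_union`, the configuration labels, and the toy SNP identifiers are hypothetical, and real annotation comparisons operate on SNP-to-protein mappings, not on bare SNP identifiers.

```python
def coverage_vs_union(annotated):
    """Return each configuration's coverage of the union of all annotated SNPs.

    `annotated` maps a configuration label to the set of SNP identifiers that
    the configuration was able to annotate.
    """
    union = set().union(*annotated.values()) if annotated else set()
    if not union:
        return {name: 0.0 for name in annotated}
    return {name: len(snps) / len(union) for name, snps in annotated.items()}


# Toy example with hypothetical configuration labels and SNP identifiers.
annotated = {
    "tool_1/model_A": {"snp1", "snp2", "snp3"},
    "tool_2/model_A": {"snp3", "snp4"},
}
coverage = coverage_vs_union(annotated)
# Union has 4 SNPs: tool_1/model_A covers 0.75, tool_2/model_A covers 0.5.
```

By construction, merging all configurations recovers the full union, which is why an integrated strategy cannot do worse on coverage than any single tool or gene model.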

16

Lysophosphatidic Acid (LPA) Salivary Species Detection and Whole-mount LPA Receptor Localization in Mouse Salivary Gland

Cerutis, D. R.; Kumar, D.; Nichols, M. G.; Roemer, G. R.; Fluent, M. E.; Miyamoto, T.; Alnouti, Y.

2026-05-01 pharmacology and toxicology 10.64898/2026.04.28.721492 medRxiv

Top 0.2%

3.3%

This study builds on our previous findings on the role of salivary lysophosphatidic acid (LPA) species in humans to investigate their presence, together with salivary gland LPA receptor (LPAR) expression, in a Porphyromonas gingivalis-infected murine (C57BL/6J) model of periodontal disease (PD). Utilizing LC-MS/MS for LPA analysis alongside confocal LPAR imaging and second harmonic generation (SHG) imaging for collagen visualization, we compared mouse salivary LPA levels and gland LPAR expression to previously established human and mouse data. The findings reveal that while healthy mouse saliva maintains low homeostatic LPA levels, PD triggers an approximately 10-fold increase, mirroring the elevation we observed in PD patients. Furthermore, the study confirmed the presence of LPA1, LPA3, and LPA4 within submandibular gland (SMG) tissue. Notably, LPA3 was the most widely distributed subtype, and the study provides the first evidence of LPA4 expression in adult mouse salivary glands. The presence of multiple LPARs suggests that LPA signaling is a critical factor in salivary gland biology and indicates that these receptors must be taken into consideration in future research concerning autoimmune conditions and in pharmacological studies involving drugs that affect salivary gland biology and secretory function.

17

Large-scale association study identifies lung cancer susceptibility copy number variants and their potential functional role in genetic instability

Xiao, F.; Qin, F.; Luo, X.; Slewitzke, S. E.; Fernandes, G. F.; Johansson, M.; Xiao, X.; Zaridze, D.; Bojesen, S. E.; Shete, S.; Albanes, D.; Aldrich, M. C.; Tardon, A.; Fernandez-Tardon, G.; Le Marchand, L.; Rennert, G.; Bickeböller, H.; Wichmann, H.-E.; Risch, A.; Muley, T.; Rosenberger, A.; Field, J. K.; Davies, M.; Woll, P.; Kiemeney, L. A.; Haugen, A.; Zienolddiny, S.; Lam, S.; Johansson, M.; Grankvist, K.; Schabath, M. B.; Andrew, A.; Lazarus, P.; Arnold, S. M.; Zhu, D.; Brenner, H.; Neuhouser, M. L.; Hung, R. J.; Christiani, D. C.; McKay, J.; Cai, G.; Xia, J.; Amos, C. I.

2026-05-15 genetic and genomic medicine 10.64898/2026.05.11.26352741 medRxiv

Top 0.2%

3.3%

Background: Genome-wide association studies (GWAS) have identified numerous lung cancer susceptibility loci based on single nucleotide polymorphisms (SNPs), yet a substantial proportion of heritability remains unexplained. We therefore evaluated germline copy number variants (CNVs) as an underexplored source of genetic susceptibility and potential contributors to genomic instability in lung cancer. Methods: We conducted a genome-wide analysis of germline CNVs using 19,342 cases and 15,917 controls from the Transdisciplinary Research in Cancer of the Lung (TRICL) consortium, with replication in two independent cohorts. High-confidence CNVs were identified by integrating two CNV callers, PennCNV and modSaRa2. Association analyses were performed using both gene-based and CNV region-based approaches. Polygenic risk scores (PRS) were constructed from top loci, and functional validation was conducted using siRNA-mediated knockdown in lung fibroblast cells. Results: We identified CNVs in four genomic regions (1p36.22, 2q31.2, 6p21.32, and 19q13.32) significantly associated with lung cancer risk. Two loci (1p36.22 and 2q31.2) were consistently supported across both analytical strategies. A CNV-based PRS constructed from key genes (CLCN6, NFE2L2, OPA3, and PSMB8) was significantly associated with lung cancer risk and replicated across independent datasets. Functional assays demonstrated that knockdown of NFE2L2 and OPA3 increased endogenous DNA damage, supporting a role in genomic stability. Conclusions: Germline CNVs contribute to lung cancer susceptibility and may influence carcinogenesis through mechanisms related to genomic instability. Impact: These findings expand the genetic architecture of lung cancer and highlight CNVs as potential biomarkers for improving risk stratification and informing precision prevention strategies.

18

Disentangling the Shared and Differential Genetic Architecture Between COVID-19 and Other Respiratory Disorders: A Multi-Omics Genome-Wide Analysis

Xue, X.; Lin, Y.-P.; Feng, Y.; So, H.-C.

2026-03-26 genetic and genomic medicine 10.64898/2026.03.21.26348591 medRxiv

Top 0.2%

2.7%

Background: A bidirectional relationship has been observed between COVID-19 and respiratory disorders, where respiratory comorbidities increase severity and COVID-19 induces respiratory sequelae. The underlying biological and genetic mechanisms remain unclear. While previous studies have identified overlapping genetic loci, few have systematically disentangled the genetic factors shared between these conditions versus those specific to COVID-19, particularly at a multi-omics level. Methods: We developed and applied a unified analytical framework to compare three COVID-19 phenotypes with eight respiratory disorders (including asthma, COPD, IPF, and pneumonia). Utilizing the cofdr method for shared genetic signal analysis and DDx/mtCOJO for differentiation, we integrated genome-wide association statistics with multi-omics data (transcriptome, splicing, and proteome). This approach allowed for the simultaneous identification of shared genetic signals (concordant or discordant) and disease-specific variants across expression (TWAS), alternative splicing (spTWAS), and protein abundance (PWAS). Results: We delineated a comprehensive atlas of 214 differential and numerous shared loci across 24 pairwise comparisons. The shared genetic architecture was characterized by pleiotropic effects in genes such as ATP11A (exhibiting opposing effects in COVID-19 vs. IPF) and GSDMB (shared with COPD). Crucially, differentiation analysis revealed that severe COVID-19 is genetically distinct from other respiratory infections (e.g., pneumonia and influenza) through dysregulated Type I/III interferon signaling and specific defects in alveolar epithelial and macrophage function, as well as GM-CSF/surfactant metabolism pathways. These findings provide direct genetic evidence supporting the use of GM-CSF modulators and interferon-lambda for COVID-19 treatment, therapies that have already entered clinical trials. Furthermore, multi-trait conditional analysis prioritized FYCO1 and HCN3 as potential COVID-19-specific risk genes. Splicing analysis underscored the critical role of alternative splicing in both shared and differential architectures, highlighting IFNAR2 isoform regulation as a key discriminator between COVID-19 and other respiratory traits. Conclusion: This study provides the first genome-wide, multi-omics map revealing the shared and differential genetic landscapes of COVID-19 and other respiratory phenotypes. By uncovering specific molecular mechanisms that distinguish COVID-19 pathology, specifically involving surfactant homeostasis and interferon pathways, our findings offer novel insights for targeted drug repurposing and precision risk stratification.

19

DxFit: An ensemble method for identifying EHR diagnoses consistent with a molecular finding

Torene, R. I.; Meltz Murphy, K.; Brandt, T.; Retterer, K.

2026-04-28 genomics 10.64898/2026.04.24.720629 medRxiv

Top 0.2%

2.7%

As population DNA sequencing becomes more common, genomic-first approaches are increasingly used to identify individuals with possible rare genetic disorders. To accurately estimate prevalence and penetrance, these studies often confirm manifestation of the disorder using electronic health records (EHRs). Multiple strategies exist to search the EHR for diagnoses of rare disorders; however, each has its limitations. We have developed a portable ensemble tool, DxFit, that mines EHR data (ICD codes and structured diagnosis descriptions from billing code and problem list tables) for a diagnosis consistent with a given rare genetic disorder. DxFit combines evidence across four strategies: (1) gene name searches in diagnosis descriptions and notes, (2) ICD conversion to Mondo rare disorder ontology to find exact and nearby matches, (3) word embedding similarity searches, and (4) Jaccard similarity matches. DxFit prioritizes the match type and outputs the most confident match for each participant-disorder pair. On a cohort of 350 participants with a known positive result from diagnostic genetic testing for developmental disorders, DxFit had a sensitivity of 88.7% and specificity of 86.2% using default parameters. Adjusting the linguistic scoring thresholds from 0.8 to 0.7 and allowing for synonymous matches yielded a sensitivity of 92.7% and specificity of 84.5%. Partitioning EHR evidence into windows before and after genetic testing demonstrates, as expected, that the overall DxFit rates increase after testing and the match types become more confident. DxFit is available to the public and has extensive customization options to support a wide range of uses.
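To make the fourth strategy concrete, here is a minimal sketch of a Jaccard similarity match between a diagnosis description and a disorder name, with a score threshold like the 0.8 and 0.7 values mentioned in the abstract. This is not DxFit's implementation: the whitespace tokenization, the function names, and the example strings are assumptions made purely for illustration.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity of two strings over lowercase whitespace tokens.

    Tokenization by whitespace is an assumption of this sketch; a real tool
    may normalize punctuation, stop words, or synonyms differently.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_jaccard_match(diagnosis: str, disorder: str, threshold: float = 0.8) -> bool:
    """Hypothetical helper: accept the pair if the score meets the threshold."""
    return jaccard_similarity(diagnosis, disorder) >= threshold


# Made-up example strings, not records from the study cohort.
diagnosis = "Noonan syndrome type 1"
disorder = "Noonan syndrome 1"
score = jaccard_similarity(diagnosis, disorder)  # 3 shared tokens / 4 total
print(score, is_jaccard_match(diagnosis, disorder, 0.8),
      is_jaccard_match(diagnosis, disorder, 0.7))
```

With these toy strings the pair is rejected at a threshold of 0.8 and accepted at 0.7, which illustrates how loosening the threshold can raise sensitivity at some cost to specificity.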

20

Bacterial Virulence Genes Detected by Metagenomic Sequencing in the Cystic Fibrosis Airway Microbiome

Valluri, M. L.; Harmon, B.; Burrell, A.; Hahn, A.

2026-05-19 microbiology 10.64898/2026.05.19.726200 medRxiv

Top 0.2%

2.6%

Background: Cystic fibrosis (CF) is an autosomal recessive genetic disorder that leads to chronic infection and mucus retention in the lungs, with lung function gradually deteriorating through recurrent pulmonary exacerbations (PEx). Virulence factors (VFs) of Pseudomonas aeruginosa and Staphylococcus aureus are thought to contribute to pulmonary exacerbations. Our study objective was to identify VF genes related to PEx, high Pseudomonas abundance, and high Staphylococcus abundance in persons with CF (pwCF). Methods: This was an ancillary study of pwCF treated with IV antibiotics for PEx between 2016 and 2020 at Children's National Hospital. Using shotgun metagenomics and ShortBRED, we identified bacterial VF genes and used DESeq2 to determine differential expression of VF genes across comparators. Results: Twenty-two pwCF experienced 43 PEx. The study cohort had a mean age of 14.6 years; 41% were female, 59% white, and 36% Hispanic, and 45% had an F508del homozygous CFTR mutation. Minimal differences in VF gene abundance were identified across clinical state. The most differentially increased VF genes found in Pseudomonas high samples were associated with an aminotransferase (log2FC 25.9), flagellar biosynthesis (log2FC 8.3), and type VI secretion systems (log2FC 8.2). The most differentially increased VF genes found in Staphylococcus high samples were an exotoxin (log2FC 26.7), macrolide phosphotransferase (log2FC 25.8), pathogenicity island proteins (log2FC 25.2 and 24.7), and VOC family proteins (log2FC 24.8). Conclusions: These findings demonstrate that specific VFs associated with immune modulation, secretion systems, bacterial motility, and antibiotic resistance are related to P. aeruginosa and S. aureus abundance, providing potential targets for more personalized antimicrobial interventions.